Online Learning with Feedback Graphs: Beyond Bandits
Authors
Tel Aviv University, Tel Aviv, Israel, and Microsoft Research, Herzliya, Israel, [email protected].
Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy, [email protected]. Parts of this work were done while the author was at Microsoft Research, Redmond.
Microsoft Research, Redmond, Washington, [email protected].
Technion—Israel Institute of Technology, Haifa, Israel, and Microsoft Research, Herzliya, Israel, [email protected]. Parts of this work were done while the author was at Microsoft Research, Redmond.
Abstract
We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe their own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with Θ̃(α^{1/2} T^{1/2}) minimax regret, where α is the independence number of the underlying graph; the second class induces problems with Θ̃(δ^{1/3} T^{2/3}) minimax regret, where δ is the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.
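The abstract's three-way classification can be illustrated with a short sketch. The function below is a hypothetical illustration (it is not code from the paper): it assumes a feedback graph over n actions given as directed edges (u, v), meaning that playing action u reveals the loss of action v, and applies the usual definitions — a vertex is observable if some action reveals its loss, and strongly observable if it has a self-loop or every other action reveals its loss.

```python
# Hypothetical sketch of the observability classification from the abstract.
# Assumption: edges is a set of directed pairs (u, v) meaning that playing
# action u reveals the loss of action v; self-loops (v, v) are allowed.

def classify_feedback_graph(n, edges):
    """Classify a feedback graph over actions 0..n-1 as strongly
    observable, weakly observable, or unobservable."""
    in_neighbors = {v: set() for v in range(n)}
    for u, v in edges:
        in_neighbors[v].add(u)

    def observable(v):
        # v's loss is revealed by playing at least one action
        return len(in_neighbors[v]) > 0

    def strongly_observable(v):
        # v has a self-loop, or every *other* action reveals v's loss
        return v in in_neighbors[v] or all(
            u in in_neighbors[v] for u in range(n) if u != v
        )

    if not all(observable(v) for v in range(n)):
        return "unobservable"         # some loss is never revealed: linear regret
    if all(strongly_observable(v) for v in range(n)):
        return "strongly observable"  # regret on the order of (alpha * T)^{1/2}
    return "weakly observable"        # regret on the order of delta^{1/3} * T^{2/3}

# The two classical extremes are both strongly observable:
full_info = {(u, v) for u in range(3) for v in range(3)}  # experts setting
bandit = {(v, v) for v in range(3)}                       # multi-armed bandit
print(classify_feedback_graph(3, full_info))  # -> strongly observable
print(classify_feedback_graph(3, bandit))     # -> strongly observable
```

Under these definitions, prediction with expert advice (every action reveals every loss) and the multi-armed bandit problem (self-loops only) both fall in the strongly observable class, matching the Θ̃(α^{1/2} T^{1/2}) regime described above.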
Related papers
Active Search and Bandits on Graphs using Sigma-Optimality
Many modern information access problems involve highly complex patterns that cannot be handled by traditional keyword-based search. Active Search is an emerging paradigm that helps users quickly find relevant information by efficiently collecting and learning from user feedback. We consider active search on graphs, where the nodes represent the set of instances users want to search over and the...
Multi-dueling Bandits with Dependent Arms
The dueling bandits problem is an online learning framework for learning from pairwise preference feedback, and is particularly well-suited for modeling settings that elicit subjective or implicit human feedback. In this paper, we study the problem of multi-dueling bandits with dependent arms, which extends the original dueling bandits setting by simultaneously dueling multiple arms as well as m...
Reducing Dueling Bandits to Cardinal Bandits
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form “A is preferred to B” (as opposed to cardinal feedback like “A has value 2.5”), giving it wide applicability in learning from implicit user feedback and revealed and stated prefer...
Generic Exploration and K-armed Voting Bandits
We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of ...
Corralling a Band of Bandit Algorithms
We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own. The main challenge is that when run with a master, base algorithms unavoidably receive much less feedback and it is thus critical that the mas...
Journal:
Volume/Issue:
Pages: -
Publication date: 2015